Abstract

In many wireless communication systems, radios are subject to a duty cycle constraint, that is, a 
radio only actively transmits signals over a fraction of the time. For example, it is desirable to have a 
small duty cycle in some low power systems; a half-duplex radio cannot keep transmitting if it wishes 
to receive useful signals; and a cognitive radio needs to listen and detect primary users frequently. This 
work studies the capacity of scalar discrete-time Gaussian channels subject to a duty cycle constraint as
well as an average transmit power constraint. An idealized duty cycle constraint is first studied, which
can be regarded as a requirement on the minimum fraction of nontransmissions or zero symbols in 
each codeword. A unique discrete input distribution is shown to achieve the channel capacity. In many 
situations, numerically optimized on-off signaling can achieve much higher rate than Gaussian signaling 
over a deterministic transmission schedule. This is in part because the positions of nontransmissions 
in a codeword can convey information. Furthermore, a more realistic duty cycle constraint is studied, 
where the extra cost of transitions between transmissions and nontransmissions due to pulse shaping is 
accounted for. The capacity-achieving input is no longer independent over time and is hard to compute. 
A lower bound of the achievable rate as a function of the input distribution is shown to be maximized 
by a first-order Markov input process, the distribution of which is also discrete and can be computed 
efficiently. The results in this paper suggest that, under various duty cycle constraints, departing from 
the usual paradigm of intermittent packet transmissions may yield substantial gain. 

This work has been presented in part at the 2011 and 2012 IEEE International Symposium on Information Theory. 
I. Introduction

In many wireless communication systems, a radio is designed to transmit actively only for a fraction
of the time, which is known as its duty cycle. For example, the ultra-wideband system in [1] transmits
short bursts of signals to trade bandwidth for power savings. The physical half-duplex constraint also
requires a radio to stop transmission over a frequency band from time to time if it wishes to receive useful
signals over the same band. Thus wireless relays are subject to a duty cycle constraint, as are cognitive
radios, which have to listen to the channel frequently to avoid causing interference to primary users. The
de facto standard solution under a duty cycle constraint is to transmit packets intermittently.

This work studies the fundamental question of what is the optimal signaling for a Gaussian channel 
with duty cycle constraint as well as average transmission power constraint. An important observation 
is that the signaling in nontransmission periods can be regarded as transmission of a special zero signal. 
We first make a simplistic and idealized assumption that the analog waveform corresponding to each 
transmitted symbol is exactly of the span of one symbol interval. We restrict our attention to discrete- 
time scalar additive white Gaussian noise (AWGN) channels for simplicity, where the duty cycle constraint 
is equivalent to a requirement on the minimum fraction of zero symbols in each transmitted codeword, 
which is called the idealized duty cycle constraint. We then consider the case where a practical pulse 
shaping filter is used, e.g., for band-limited transmissions. As such, during a transition between a zero 
symbol and a nonzero symbol, the pulse waveform of the nonzero symbol leaks into the interval of 
the zero symbol. A realistic duty cycle constraint must include the extra cost incurred upon transitions 
between zero and nonzero symbols. The mathematical model of the preceding input-constrained channels
is described in Section II.

Determining the capacity of a channel subject to various input constraints is a classical problem. It
is well known that Gaussian signaling achieves the capacity of a Gaussian channel with average input
power constraint only. In addition, Zamir [2] shows that the mutual information rate achievable using a
white Gaussian input never incurs a loss of more than half a bit per sample with respect to the power
constrained capacity. Furthermore, Smith [3] investigated the capacity of a scalar AWGN channel under
both peak power constraint and average power constraint. The input distribution that achieves the capacity
is shown to be discrete with a finite number of probability mass points. The discreteness of
capacity-achieving distributions for various channels, including quadrature Gaussian channels and Rayleigh-fading
channels, is also established in [4]-[9]. Chan [10] studied the capacity-achieving input distribution for
conditional Gaussian channels, which form a general channel model for many practical communication
systems. Until now, the impact of the duty cycle constraint on capacity-achieving signaling has been underexplored
in the literature.



The main results of this paper are summarized in Section III. In the case of the idealized duty cycle
constraint, because all costs associated with the constraints can be decomposed into per-letter costs, the
optimal input distribution is independent and identically distributed (i.i.d.). We use an approach similar to that
in [3] and [10] to show that the capacity-achieving input distribution for an AWGN channel with duty
cycle constraint and average power constraint is discrete. Unlike in [3] and [10], the optimal distribution
has an infinite number of probability mass points, whereas only a finite number of the points are found
in every bounded interval. This allows efficient numerical optimization of the input distribution.

The case of realistic duty cycle constraint is more challenging. Because the constraint concerns symbol 
transitions, the capacity-achieving input distribution is no longer independent over time, and becomes 
hard to compute. We develop a good lower bound of the input-output mutual information as a function 
of the input distribution. It is proved that, under the realistic duty cycle constraint, a first-order Markov 
process maximizes the lower bound, the distribution of which is also discrete and can be computed 
efficiently. The main theorems for the cases of idealized and realistic duty cycle constraints are proved
in Sections IV and V, respectively.



We devote Section VI to the numerical methods and results. In order to compute the achievable rate
when the input is a Markov chain, a Monte Carlo method is introduced in Section VI-A to numerically
compute the differential entropy rate of hidden Markov processes. Numerical results in Section VI-B
demonstrate that, in the case of the idealized duty cycle constraint, numerically optimized discrete
signaling achieves higher rates than Gaussian signaling over a deterministic transmission schedule.
For example, if the radio is allowed to transmit no more than half the time, i.e., the duty cycle is no
greater than 50%, a near-optimal discrete input achieves 50% higher rate at 10 dB signal-to-noise ratio
(SNR). In the case of the realistic duty cycle constraint, numerical results also show that the rate achieved
by the Markov process is substantially higher than that achieved by any i.i.d. input. This suggests that,
compared to intermittently transmitting packets using Gaussian or Gaussian-like signaling, it is more
efficient to disperse nontransmission symbols within each packet to form codewords, which results in a
form of on-off signaling.

One of the reasons for the superiority of on-off signaling is that the positions of nontransmission
symbols can be used to convey information, the impact of which is particularly significant in case of low
SNR or low duty cycle. This has been observed in the past. For example, as shown in [11] (see also [12],
[13]), time sharing or time-division duplex (TDD) can fall considerably short of the theoretical limits in
a relay network: The capacity of a cascade of two noiseless binary bit pipes through a half-duplex relay
is 1.14 bits per channel use, which far exceeds the 0.5 bit achieved by TDD and even the 1 bit upper
bound on the rate of binary signaling.

Besides the fact that duty cycle constraints are frequently seen in practice, another motivation of this study is
a recent work [14], in which on-off signaling is proposed for a clean-slate design of wireless ad hoc
networks formed by half-duplex radios. Using this signaling scheme, which is called rapid on-off-division
duplex (RODD), a node listens to the channel and receives useful signals during its own off symbols
within each frame. Each node can transmit and receive messages at the same time over one frame interval,
thereby achieving (virtual) full-duplex communication. Understanding the impact of the duty cycle constraint
is crucial to characterizing the fundamental limits of such wireless networks.

II. System Model 

Consider digital communication systems where coded data are mapped to waveforms for transmission. 
Usually there is a collection of pulse waveforms, where each pulse represents a symbol (or letter) from a 
discrete alphabet. We view nontransmission over a symbol interval as transmitting the all zero waveform. 
In other words, a symbol interval of nontransmission is simply regarded as transmitting a special symbol 
"0," which carries no energy. 

As far as the capacity-achieving input is concerned, it suffices to consider the baseband discrete-time
model for the AWGN channel. The received signal over a block of $n$ symbols can be described by

$$Y_i = X_i + N_i \tag{1}$$

where $i = 1, \dots, n$, $X_i$ denotes the transmitted symbol at time $i$, and $N_1, \dots, N_n$ are independent standard
Gaussian random variables. For simplicity, we assume there is no inter-symbol interference at the receiver. Each
symbol modulates a continuous-time pulse waveform for transmission. If the width of all pulses were
exactly one symbol interval, which is denoted by $T$, the duty cycle would be equal to the fraction of nonzero
symbols in a codeword. In practice, however, the pulse is usually wider than $T$, so that the support of the
transmitted waveform is greater than the sum of the intervals corresponding to nonzero symbols due to
leakage into intervals of adjacent zero symbols. To be specific, suppose the width of a pulse is $(1 + 2c)T$;
then each transition between zero and nonzero symbols incurs an additional cost of up to $cT$ in terms
of actual transmission time.

Let $1 - q$ denote the maximum duty cycle allowed. In this paper, we require every codeword $(x_1, x_2, \dots, x_n)$
to satisfy

$$\frac{1}{n} \sum_{i=1}^{n} 1_{\{x_i \ne 0\}} + \frac{1}{n}\, 2c \left( \sum_{i=1}^{n-1} 1_{\{x_i = 0,\, x_{i+1} \ne 0\}} + 1_{\{x_n = 0,\, x_1 \ne 0\}} \right) \le 1 - q \tag{2}$$

where $1_{\{\cdot\}}$ is the indicator function, and the transition cost is twice that of zero-to-nonzero transitions,
because the number of nonzero-to-zero transitions and the number of zero-to-nonzero transitions are equal
under the cyclic transition cost configuration. From now on, we refer to (2) as duty cycle constraint
$(q, c)$. Note that the idealized duty cycle constraint is the special case $(q, 0)$. If $c \in [0, \frac{1}{2}]$, then the left
hand side of (2) is equal to the actual duty cycle. If $c > \frac{1}{2}$, the left hand side of (2) is an overestimate of
the duty cycle. Nonetheless, we use constraint (2) for its simplicity. In addition, we consider the usual
average input power constraint,



$$\frac{1}{n} \sum_{i=1}^{n} x_i^2 \le \gamma. \tag{3}$$
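As an illustration, the two codeword constraints can be checked directly for a given sequence of symbols. The following is a minimal sketch, assuming a codeword stored as a Python list; the function names `duty_cycle_cost`, `average_power`, and `satisfies_constraints` are hypothetical and introduced here only for illustration.

```python
def duty_cycle_cost(x, c):
    """Left-hand side of the duty cycle constraint (2) for codeword x."""
    n = len(x)
    # Fraction of nonzero (transmission) symbols.
    nonzero = sum(1 for s in x if s != 0)
    # Zero-to-nonzero transitions, counted cyclically (x_n is followed by x_1).
    transitions = sum(
        1 for i in range(n) if x[i] == 0 and x[(i + 1) % n] != 0
    )
    return (nonzero + 2.0 * c * transitions) / n


def average_power(x):
    """Average power of codeword x."""
    return sum(s * s for s in x) / len(x)


def satisfies_constraints(x, q, c, gamma):
    """True if x meets duty cycle constraint (q, c) and power constraint gamma."""
    return duty_cycle_cost(x, c) <= 1.0 - q and average_power(x) <= gamma


# Example codeword: three nonzero symbols and two zero-to-nonzero transitions.
codeword = [0, 1.5, -1.5, 0, 0, 2.0, 0, 0]
print(duty_cycle_cost(codeword, c=0.0), duty_cycle_cost(codeword, c=0.25))
```

With $c = 0$ the cost reduces to the fraction of nonzero symbols, which is the idealized constraint; a positive $c$ penalizes each burst of transmission in addition to its length.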



In many wireless systems, the transmitter's activity is constrained in the frequency domain as well as 
in the time domain. In principle, the results in this paper also apply to the more general model where 
the duty cycle constraint is on the time-frequency plane. 
III. Main Results

A. The Case of Idealized Duty Cycle Constraint 

Let $\mu$ denote the distribution of the channel input $X$. The set of distributions with duty cycle constraint
$(q, 0)$ and power constraint $\gamma$ is denoted by

$$\Lambda(\gamma, q) = \left\{ \mu : \mu(\{0\}) \ge q,\ E_\mu\{X^2\} \le \gamma \right\}. \tag{4}$$

It should be understood that $\mu$ is a probability measure defined on the Borel algebra on the real number
set, denoted by $\mathcal{B}(\mathbb{R})$.

Theorem 1: The capacity of the additive white Gaussian noise channel (1) with its idealized duty cycle
no greater than $1 - q$ and the average power no greater than $\gamma$ is

$$C(\gamma, q) = \max_{\mu \in \Lambda(\gamma, q)} I(\mu). \tag{5}$$

In particular, the following properties hold:

a) the maximum of (5) is achieved by a unique (capacity-achieving) distribution $\mu_0 \in \Lambda(\gamma, q)$;

b) $\mu_0$ is symmetric about 0 and its second moment is exactly equal to $\gamma$; and

c) $\mu_0$ is discrete with an infinite number of probability mass points, whereas the number of probability
mass points in any bounded interval is finite.

The proof of Theorem 1 is relegated to Section IV. Property (b) suggests that the capacity-achieving
input always exhausts the power budget. Property (c) indicates that the capacity-achieving input can be
well approximated by some discrete inputs with finite alphabet, which can be computed using numerical
methods. The achievable rate of numerically optimized input distributions is studied in Section VI.



B. The Case of Realistic Duty Cycle Constraint 

In this paper, let $X_k^n$ denote the subsequence $(X_k, X_{k+1}, \dots, X_n)$, where $X_k^\infty = (X_k, X_{k+1}, \dots)$. We
also use the shorthand $X^n = X_1^n$. Let $\mu$ denote the probability distribution of the process $X_1, X_2, \dots$. We
use $\mu_{X_i}$ to denote the marginal distribution of $X_i$, and $\mu_{X_i, X_j}$ to denote the joint probability distribution
of $(X_i, X_j)$. Denote the set of $n$-dimensional distributions which satisfy duty cycle constraint $(q, c)$ and
power constraint $\gamma$ by

$$\Lambda^n(\gamma, q, c) = \left\{ \mu : \frac{1}{n} \sum_{i=1}^{n} \left[ \mu_{X_i}(\{0\}) - 2c\, \mu_{X_i, X_{(i \bmod n) + 1}}\big(\{0\} \times (\mathbb{R} \setminus \{0\})\big) \right] \ge q,\ \frac{1}{n} \sum_{i=1}^{n} E_\mu\{X_i^2\} \le \gamma \right\} \tag{6}$$

where

$$\mu_{X_i, X_j}\big(\{0\} \times (\mathbb{R} \setminus \{0\})\big) = P(X_i = 0, X_j \ne 0) \tag{7}$$

denotes the probability of a zero-to-nonzero transition and

$$i \bmod n = \begin{cases} i, & \text{if } 1 \le i < n, \\ 0, & \text{if } i = n. \end{cases} \tag{8}$$



For convenience in a subsequent proof, the duty cycle in (6) is defined in a cyclic manner using the
modular operation, where a transition between $X_n$ and $X_1$ is also counted. This of course has vanishing
impact as $n \to \infty$ and thus no impact on the capacity.

The capacity of the AWGN channel (1) with duty cycle constraint $(q, c)$ and power constraint $\gamma$ is

$$C(\gamma, q, c) = \lim_{n \to \infty} \frac{1}{n} \max_{P_{X^n} \in \Lambda^n(\gamma, q, c)} I(X^n; Y^n). \tag{9}$$



The capacity is in fact achieved by a stationary input process. This is justified in Section V-A by
showing that any nonstationary input process has a stationary counterpart with equal or greater
input-output mutual information per symbol. Let us denote the set of stationary distributions which satisfy duty
cycle constraint $(q, c)$ and power constraint $\gamma$ by

$$\Lambda(\gamma, q, c) = \left\{ \mu : \mu \text{ is stationary},\ E_\mu\{X_1^2\} \le \gamma,\ \mu_{X_1}(\{0\}) - 2c\, \mu_{X_1, X_2}\big(\{0\} \times (\mathbb{R} \setminus \{0\})\big) \ge q \right\}. \tag{10}$$

Theorem 2: For any $\mu \in \Lambda(\gamma, q, c)$, let

$$L(\mu) = I(X; Y) - I(X_1; X_2^\infty) \tag{11}$$

where $I(X; Y)$ is the mutual information of the additive white Gaussian noise channel between the input
symbol $X$, which follows distribution $\mu_{X_1}$, and the corresponding output $Y$. The following properties
hold:

a) $L(\mu)$ is a lower bound of the channel capacity;

b) The maximum of $L(\cdot)$ is achieved by a discrete first-order Markov process, denoted by $\mu^*$;

c) $\mu^*$ satisfies the following property: Define $B_i = 1_{\{X_i \ne 0\}}$, $i = 1, 2, \dots$. Then for every $i$, conditioned
on $B_i$ and $B_{i+1}$, the variables $X_i$ and $X_{i+1}$ are independent, and

$$L(\mu^*) = I(X; Y) - I(B_1; B_2). \tag{12}$$



The proof of Theorem 2 is relegated to Section V. Evidently, increasing the input power by scaling
the input linearly not only maintains its duty cycle, but also increases the mutual information. Therefore,
the optimal input distribution must exhaust the power budget $\gamma$.
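To make the penalty term $I(B_1; B_2)$ in (12) and the stationary duty cycle constraint concrete, the following minimal sketch evaluates both for a two-state on-off Markov chain. The parameterization by the transition probabilities `a` and `b` and the function names are assumptions made for this illustration; they are not notation from the paper.

```python
import math


def binary_entropy(p):
    """Binary entropy in bits, with the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def markov_on_off_stats(a, b, c):
    """On-off chain with a = P(B2=1 | B1=0) and b = P(B2=0 | B1=1), a + b > 0.

    Returns (margin, mi), where margin is
    mu_{X1}({0}) - 2c * P(X1 = 0, X2 != 0), to be compared with q,
    and mi is I(B1; B2) in bits under the stationary distribution.
    """
    pi0 = b / (a + b)  # stationary probability of a zero symbol
    pi1 = a / (a + b)  # stationary probability of a nonzero symbol
    margin = pi0 - 2.0 * c * pi0 * a
    # I(B1; B2) = H(B2) - H(B2 | B1)
    mi = binary_entropy(pi1) - (pi0 * binary_entropy(a) + pi1 * binary_entropy(b))
    return margin, mi


# An i.i.d. on-off pattern versus a "sticky" pattern with the same zero fraction.
print(markov_on_off_stats(0.5, 0.5, c=0.25))
print(markov_on_off_stats(0.1, 0.1, c=0.25))
```

The sticky chain makes fewer transitions and therefore leaves a larger duty cycle margin, but it pays a larger penalty $I(B_1; B_2)$; the maximization of $L(\cdot)$ balances these two effects.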

IV. Proof of Theorem 1 (the Case of Idealized Duty Cycle Constraint)

This section is devoted to a proof of Theorem 1 for the case of the idealized duty cycle constraint
$(q, 0)$. The conditional probability density function (pdf) of the output given the input of the AWGN
channel (1) is

$$p_{Y|X}(y|x) = \phi(y - x) \tag{13}$$

where

$$\phi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}} \tag{14}$$

is the standard Gaussian pdf.

With the idealized constraint, the capacity of the AWGN channel is achieved by an i.i.d. process and
the duty cycle constraint reduces to a per-symbol cost constraint. For a given input distribution $\mu$, the pdf
of the output exists and is expressed as

$$p_Y(y; \mu) = \int p_{Y|X}(y|x)\, \mu(dx) = E_\mu\{\phi(y - X)\}. \tag{15}$$

Denote the relative entropy $D\big(p_{Y|X}(\cdot|x) \,\|\, p_Y(\cdot; \mu)\big)$ by $d(x; \mu)$, which is expressed as

$$d(x; \mu) = \int_{-\infty}^{\infty} p_{Y|X}(y|x) \log \frac{p_{Y|X}(y|x)}{p_Y(y; \mu)}\, dy. \tag{16}$$

The mutual information $I(\mu) = I(X; Y)$ is then

$$I(\mu) = \int d(x; \mu)\, \mu(dx) = E_\mu\{d(X; \mu)\}. \tag{17}$$

The capacity of the AWGN channel under per-letter duty cycle constraint and power constraint is
evidently given by the supremum of the mutual information $I(\mu)$ where $\mu \in \Lambda(\gamma, q)$. The achievability
and converse of this result can be established using standard techniques in information theory.
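For a discrete input with finitely many mass points, the quantities in (15)-(17) can be evaluated by numerical integration. The sketch below is one possible way to do so, assuming a uniform integration grid wide enough to cover all mass points by several noise standard deviations; the function name `mutual_information` and the grid parameters are choices made for this illustration and are not taken from the paper.

```python
import numpy as np


def mutual_information(points, probs, y_max=12.0, num=4801):
    """I(mu) in nats for a discrete input over the unit-variance AWGN channel."""
    x = np.asarray(points, dtype=float)
    p = np.asarray(probs, dtype=float)
    y = np.linspace(-y_max, y_max, num)
    dy = y[1] - y[0]
    # log phi(y - x_k): one row per mass point, one column per grid point.
    log_phi = -0.5 * (y[None, :] - x[:, None]) ** 2 - 0.5 * np.log(2.0 * np.pi)
    phi = np.exp(log_phi)
    p_y = p @ phi  # output pdf, as in (15)
    # Relative entropy d(x_k; mu), as in (16), by a Riemann sum over the grid.
    d = np.sum(phi * (log_phi - np.log(p_y)[None, :]), axis=1) * dy
    return float(p @ d)  # average over the input, as in (17)


# Three-point on-off input with mass q at zero that uses the full power budget.
gamma, q = 1.0, 0.5
amp = np.sqrt(gamma / (1.0 - q))
rate_on_off = mutual_information([0.0, -amp, amp], [q, (1 - q) / 2, (1 - q) / 2])
rate_gauss_power_only = 0.5 * np.log(1.0 + gamma)
print(rate_on_off, rate_gauss_power_only)
```

The second number is the capacity under the power constraint alone, so it upper-bounds the first; the comparison of interest in this paper is instead with Gaussian signaling restricted to a deterministic schedule of transmission slots.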



The proof of property (a) is presented in Section IV-A. Now suppose $\mu_0$ is the unique
capacity-achieving distribution; property (b) is then established as follows. Since the mirror reflection of $\mu_0$ about 0 is
evidently also a maximizer of (5), the uniqueness requires that $\mu_0$ be symmetric. Note that linear scaling
of the input to increase its power maintains its duty cycle and cannot reduce the mutual information,
as the receiver can add noise to maintain the same SNR. By the uniqueness of the maximizer $\mu_0$, the
power constraint must be binding, i.e., the second moment of $\mu_0$ must be equal to $\gamma$. In order to prove
property (c), we first establish a sufficient and necessary condition for $\mu_0$ in Section IV-B and then apply
it to show the discreteness of $\mu_0$ in Section IV-C.



A. Existence and Uniqueness of $\mu_0$

Let $\mathcal{P}$ denote the collection of all Borel probability measures defined on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, which is a
topological space with the topology of weak convergence [15]. We first establish the following lemma.
Lemma 1: $\Lambda(\gamma, q)$ is compact in the topological space $\mathcal{P}$.

Proof: According to [15], the topology of weak convergence on $\mathcal{P}$ is metrizable. Therefore, by
Prokhorov's theorem [16], in order to prove that $\Lambda(\gamma, q)$ is compact in $\mathcal{P}$, it suffices to show that it is
both tight and closed.

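For any $\epsilon > 0$, there exists an $a_\epsilon > 0$ such that for all $\mu \in \Lambda(\gamma, q)$,

$$\mu(|X| > a_\epsilon) \le \frac{E_\mu\{X^2\}}{a_\epsilon^2} \le \frac{\gamma}{a_\epsilon^2} < \epsilon \tag{18}$$

by Chebyshev's inequality. Choose $K_\epsilon = [-a_\epsilon, a_\epsilon]$; then $K_\epsilon$ is compact in $\mathbb{R}$ and $\mu(K_\epsilon) \ge 1 - \epsilon$ for all
$\mu \in \Lambda(\gamma, q)$, thus $\Lambda(\gamma, q)$ is tight.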

Let $B_m = [-\frac{1}{m}, \frac{1}{m}]$ for $m = 1, 2, \dots$. Let $\{\mu_n\}_{n=1}^{\infty}$ be a convergent sequence in $\Lambda(\gamma, q)$ with limit
$\mu_0$. Since $\mu_n(B_m) \ge q$ for every $m, n$, we have [15, Section 3.1]

$$q \le \limsup_{n \to \infty} \mu_n(B_m) \le \mu_0(B_m), \tag{19}$$

and hence

$$\mu_0(\{0\}) = \mu_0\left( \bigcap_{m=1}^{\infty} B_m \right) = \lim_{m \to \infty} \mu_0(B_m) \ge q. \tag{20}$$

Moreover, let $f(x) = x^2$, which is continuous and bounded below. By weak convergence [15, Section 3.1],
we have

$$E_{\mu_0}\{X^2\} = \int f\, d\mu_0 \le \liminf_{n \to \infty} \int f\, d\mu_n \le \gamma. \tag{21}$$

Therefore, $\mu_0 \in \Lambda(\gamma, q)$, i.e., $\Lambda(\gamma, q)$ is closed, and the compactness of $\Lambda(\gamma, q)$ then follows. ■



Since the mutual information is continuous on $\mathcal{P}$ [11, Theorem 9], it must achieve its maximum
on the compact set $\Lambda(\gamma, q)$. Hence the capacity-achieving distribution $\mu_0$ exists.

According to [17, Corollary 2], the mutual information $I(\mu)$ is strictly concave. It is easy to see that
$\Lambda(\gamma, q)$ is convex. Hence the capacity-achieving distribution $\mu_0$ must be unique.

B. Sufficient and Necessary Conditions 
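We denote the finite-power set as

$$\Lambda(q) = \bigcup_{0 \le \gamma < \infty} \Lambda(\gamma, q). \tag{22}$$

Let $\phi(\cdot)$ defined in (14) be extended to the complex plane. The relative entropy $d(x; \mu)$ defined in (16)
can be extended to the complex plane $\mathbb{C}$ and has the following property:
Lemma 2: For any $\mu \in \Lambda(q)$ and $z \in \mathbb{C}$,

$$d(z; \mu) = \int_{-\infty}^{\infty} \phi(y - z) \log \frac{\phi(y - z)}{p_Y(y; \mu)}\, dy \tag{23}$$

is a holomorphic function of $z$ on $\mathbb{C}$. Consequently, $d(x; \mu)$ is a continuous function of $x$ on $\mathbb{R}$.

Proof: It can be shown that $\int_{-\infty}^{\infty} \phi(y - z) \log \phi(y - z)\, dy$ is a constant, thus a holomorphic function
of $z$ on $\mathbb{C}$. Therefore, it remains to prove that

$$\xi(z) = \int_{-\infty}^{\infty} \phi(y - z) \log p_Y(y; \mu)\, dy \tag{24}$$

is a holomorphic function of $z$ on $\mathbb{C}$.
First, by Jensen's inequality, we have

\begin{align}
p_Y(y; \mu) &= E_\mu\{\phi(y - X)\} \tag{25} \\
&\ge \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} E_\mu\{(y - X)^2\} \right) \tag{26} \\
&= e^{-\frac{1}{2} y^2 - a y - b} \tag{27}
\end{align}

where $a = -E_\mu\{X\}$ and $b = \frac{1}{2}\left( E_\mu\{X^2\} + \log(2\pi) \right)$ are real numbers due to the fact that $\mu \in \Lambda(q)$.
Thus, $p_Y(y; \mu) \in [e^{-\frac{1}{2} y^2 - a y - b}, 1]$, i.e.,

$$|\log p_Y(y; \mu)| \le \frac{1}{2} y^2 + a y + b. \tag{28}$$

As a result, we have

\begin{align}
|\phi(y - z) \log p_Y(y; \mu)| &\le |\phi(y - z)| \left( \frac{1}{2} y^2 + a y + b \right) \tag{29} \\
&= \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[ (y - \mathrm{Re}(z))^2 - \mathrm{Im}^2(z) \right]} \left( \frac{1}{2} y^2 + a y + b \right), \tag{30}
\end{align}

which is integrable. (Here $\mathrm{Re}(z)$ and $\mathrm{Im}(z)$ represent the real and imaginary parts of $z$, respectively.) It
follows that $\xi(z)$ given by (24) exists for any $\mu \in \Lambda(q)$ and $z \in \mathbb{C}$.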

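Suppose $U$ is an open and bounded subset of $\mathbb{C}$. There exists an $r > 0$ such that $|\mathrm{Re}(z)| < r$ and
$|\mathrm{Im}(z)| < r$ for all $z \in U$. It is easy to check that

\begin{align}
e^{-\frac{1}{2}\left[ (y - \mathrm{Re}(z))^2 - \mathrm{Im}^2(z) \right]} &= e^{-\frac{1}{2} y^2 + y\,\mathrm{Re}(z) - \frac{1}{2}\mathrm{Re}^2(z) + \frac{1}{2}\mathrm{Im}^2(z)} \tag{31} \\
&\le e^{-\frac{1}{2} y^2 + r|y| + \frac{1}{2} r^2} \tag{32} \\
&\le e^{r^2} \left( e^{-\frac{1}{2}(y - r)^2} + e^{-\frac{1}{2}(y + r)^2} \right). \tag{33}
\end{align}

Combining (29), (30) and (33) yields that

$$|\phi(y - z) \log p_Y(y; \mu)| \le \frac{e^{r^2}}{\sqrt{2\pi}} \left( e^{-\frac{1}{2}(y - r)^2} + e^{-\frac{1}{2}(y + r)^2} \right) \left( \frac{1}{2} y^2 + a y + b \right) \tag{34}$$

which is integrable. Therefore, the integral $\int \phi(y - z) \log p_Y(y; \mu)\, dy$ is uniformly convergent for all
$z \in U$. Moreover, $\phi(y - z) \log p_Y(y; \mu)$ is a holomorphic function of $z$ on $U$ for each $y \in \mathbb{R}$. According
to the differentiation lemma [18], $\xi(z)$ is a holomorphic function of $z$ on $U$. It then follows that it is
holomorphic on the whole complex plane $\mathbb{C}$. Lemma 2 is thus established. ■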
Let $F(\mu)$ be a real-valued function defined on the convex set $\Lambda(q)$ and $\mu_0 \in \Lambda(q)$. Define the weak
derivative of $F(\mu)$ at $\mu_0$ as

$$F'_{\mu_0}(\mu) = \lim_{\theta \to 0^+} \frac{F\big((1 - \theta)\mu_0 + \theta\mu\big) - F(\mu_0)}{\theta} \tag{35}$$

whenever the limit exists. The following result, which finds its parallel in [6], [9], [10], gives the weak
derivative of the mutual information function $I(\mu)$.

Lemma 3: Let $\mu_0, \mu \in \Lambda(q)$; the weak derivative of the mutual information function $I(\mu)$ at $\mu_0$ is

$$I'_{\mu_0}(\mu) = \int d(x; \mu_0)\, \mu(dx) - I(\mu_0). \tag{36}$$



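Proof: Define $\mu_\theta = (1 - \theta)\mu_0 + \theta\mu$ for all $\theta \in (0, 1]$. It can be shown that

\begin{align}
\frac{1}{\theta}\big( I(\mu_\theta) - I(\mu_0) \big) &= \frac{1}{\theta} \int \big( d(x; \mu_\theta) - d(x; \mu_0) \big)\, \mu_\theta(dx) + \frac{1}{\theta} \left( \int d(x; \mu_0)\, \mu_\theta(dx) - I(\mu_0) \right) \tag{37} \\
&= -\frac{1}{\theta} \int_{-\infty}^{\infty} p_Y(y; \mu_\theta) \log \frac{p_Y(y; \mu_\theta)}{p_Y(y; \mu_0)}\, dy + \int d(x; \mu_0)\, \mu(dx) - I(\mu_0). \tag{38}
\end{align}

Therefore, it suffices to show that

$$\lim_{\theta \to 0^+} \int_{-\infty}^{\infty} \frac{1}{\theta}\, p_Y(y; \mu_\theta) \log \frac{p_Y(y; \mu_\theta)}{p_Y(y; \mu_0)}\, dy = 0. \tag{39}$$

In the remainder of this proof, we find a function independent of $\theta$ that dominates the integrand so
that the dominated convergence theorem can be used to establish (39) by exchanging the order of the limit
and the integral therein.
Lemma 4: Let $\theta, a, b \in (0, 1]$. Define

$$f(\theta) = \frac{1}{\theta} \big( (1 - \theta) a + \theta b \big) \log \frac{(1 - \theta) a + \theta b}{a}; \tag{40}$$

then

$$|f(\theta)| \le b + a - b \log b - b \log a.$$

Proof: It is easy to check that $f(1) = b \log \frac{b}{a}$, $f(0^+) = b - a$ and

$$f'(\theta) = \frac{1}{\theta^2} \left[ \theta (b - a) - a \log\left( 1 - \theta + \frac{b}{a}\theta \right) \right]. \tag{41}$$

Define $g(\theta) = \theta(b - a) - a \log\left( 1 - \theta + \frac{b}{a}\theta \right)$ for $\theta \in (0, 1]$; then we have

$$g'(\theta) = \frac{\theta (b - a)^2}{(1 - \theta) a + \theta b} \ge 0. \tag{42}$$

Since $g(0^+) = 0$, $g(\theta) \ge 0$ for all $\theta \in (0, 1]$. According to (41), we have

$$f'(\theta) \ge 0. \tag{43}$$

It follows that for all $\theta \in (0, 1]$,

$$b - a = f(0^+) \le f(\theta) \le f(1) = b \log \frac{b}{a} \tag{44}$$

and hence

\begin{align}
|f(\theta)| &\le \max\left\{ |b - a|,\ \left| b \log \frac{b}{a} \right| \right\} \tag{45} \\
&\le b + a - b \log b - b \log a.
\end{align}

Lemma 4 is thus established.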
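The bound of Lemma 4 is easy to spot-check numerically. The following sketch samples random triples $(\theta, a, b)$ and records the largest observed value of $|f(\theta)|$ minus the claimed bound; the sampling ranges and the function names `f` and `lemma4_bound` are choices made for this illustration only, and a random check is of course no substitute for the proof.

```python
import math
import random


def f(theta, a, b):
    """The function f(theta) of Lemma 4, natural logarithm."""
    m = (1.0 - theta) * a + theta * b
    return m / theta * math.log(m / a)


def lemma4_bound(a, b):
    """Right-hand side of the inequality in Lemma 4."""
    return b + a - b * math.log(b) - b * math.log(a)


random.seed(0)
worst_gap = -math.inf
for _ in range(10000):
    theta = random.uniform(1e-4, 1.0)
    a = random.uniform(1e-6, 1.0)
    b = random.uniform(1e-6, 1.0)
    worst_gap = max(worst_gap, abs(f(theta, a, b)) - lemma4_bound(a, b))

# A non-positive value means no sampled point violated the bound.
print(worst_gap)
```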

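Applying Lemma 4 with $a = p_Y(y; \mu_0)$ and $b = p_Y(y; \mu)$, we have

\begin{align}
&\left| \frac{1}{\theta}\, p_Y(y; \mu_\theta) \log \frac{p_Y(y; \mu_\theta)}{p_Y(y; \mu_0)} \right| \tag{46} \\
&\quad \le p_Y(y; \mu) + p_Y(y; \mu_0) - p_Y(y; \mu) \log p_Y(y; \mu) - p_Y(y; \mu) \log p_Y(y; \mu_0) \tag{47}
\end{align}

where the right hand side is an integrable function of $y$ by the result that $-\int_{-\infty}^{\infty} p_Y(y; \mu_2) \log p_Y(y; \mu_1)\, dy < \infty$
for any $\mu_1, \mu_2 \in \Lambda(q)$. In fact, as in the proof of Lemma 2 (see (28)), there exist $a, b \in \mathbb{R}$ such that
$|\log p_Y(y; \mu_1)| \le \frac{1}{2} y^2 + a y + b$. Therefore,

\begin{align}
\int_{-\infty}^{\infty} |p_Y(y; \mu_2) \log p_Y(y; \mu_1)|\, dy &\le \int_{-\infty}^{\infty} p_Y(y; \mu_2) \left( \frac{1}{2} y^2 + a y + b \right) dy \tag{48} \\
&= \frac{1}{2} E_{\mu_2}\{X^2\} + a E_{\mu_2}\{X\} + b + \frac{1}{2} \tag{49} \\
&< \infty \tag{50}
\end{align}

due to the assumption that $\mu_2 \in \Lambda(q)$.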

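Therefore, the dominated convergence theorem provides that

\begin{align}
\lim_{\theta \to 0^+} \int_{-\infty}^{\infty} \frac{1}{\theta}\, p_Y(y; \mu_\theta) \log \frac{p_Y(y; \mu_\theta)}{p_Y(y; \mu_0)}\, dy &= \int_{-\infty}^{\infty} \lim_{\theta \to 0^+} \frac{1}{\theta}\, p_Y(y; \mu_\theta) \log \frac{p_Y(y; \mu_\theta)}{p_Y(y; \mu_0)}\, dy \tag{51} \\
&= \int_{-\infty}^{\infty} \big( p_Y(y; \mu) - p_Y(y; \mu_0) \big)\, dy \tag{52} \\
&= 0. \tag{53}
\end{align}

Lemma 3 is thus proved. ■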
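The formula (36) can be compared against a finite-difference quotient of (35) for discrete measures. The sketch below does this under stated assumptions: both measures have finitely many mass points well inside the integration grid, integrals are approximated by Riemann sums, and the step $\theta = 10^{-5}$ is an arbitrary small value. All function names are hypothetical.

```python
import numpy as np

Y = np.linspace(-14.0, 14.0, 5601)  # integration grid for the output
DY = Y[1] - Y[0]


def log_phi(t):
    """Logarithm of the standard Gaussian pdf."""
    return -0.5 * t ** 2 - 0.5 * np.log(2.0 * np.pi)


def rel_entropy(x_eval, points, probs):
    """d(x; mu) of (16) at each x in x_eval, for discrete mu = (points, probs)."""
    x_eval = np.asarray(x_eval, dtype=float)
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    lp = log_phi(Y[None, :] - x_eval[:, None])
    p_y = probs @ np.exp(log_phi(Y[None, :] - points[:, None]))
    return np.sum(np.exp(lp) * (lp - np.log(p_y)[None, :]), axis=1) * DY


def mutual_info(points, probs):
    """I(mu) of (17) in nats."""
    return float(np.asarray(probs, dtype=float) @ rel_entropy(points, points, probs))


def weak_derivative(points0, probs0, points, probs):
    """Formula (36): integral of d(x; mu0) against mu, minus I(mu0)."""
    avg = float(np.asarray(probs, dtype=float) @ rel_entropy(points, points0, probs0))
    return avg - mutual_info(points0, probs0)


def finite_difference(points0, probs0, points, probs, theta=1e-5):
    """Difference quotient of (35) along the mixture (1 - theta) mu0 + theta mu."""
    mix_points = list(points0) + list(points)
    mix_probs = [(1.0 - theta) * p for p in probs0] + [theta * p for p in probs]
    return (mutual_info(mix_points, mix_probs) - mutual_info(points0, probs0)) / theta


mu0 = ([0.0, -1.5, 1.5], [0.5, 0.25, 0.25])
mu = ([0.0, 2.0], [0.6, 0.4])
print(weak_derivative(*mu0, *mu), finite_difference(*mu0, *mu))
```

The two printed values should agree up to a term of order $\theta$, which by (38) is the normalized relative entropy between the two output densities.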
We establish the following sufficient and necessary condition for the optimal input distribution.
Lemma 5: Let

$$f_\lambda(x; \mu) = d(x; \mu) - I(\mu) - \lambda (x^2 - \gamma). \tag{54}$$

Then $\mu_0 \in \Lambda(\gamma, q)$ achieves the capacity if and only if there exists $\lambda \ge 0$ such that $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$
and $E_\mu\{f_\lambda(X; \mu_0)\} \le 0$ for all $\mu \in \Lambda(q)$.
Proof: Define the Lagrangian

$$J(\mu) = I(\mu) - \lambda E_\mu\{X^2 - \gamma\} \tag{55}$$

where $\lambda$ is the Lagrange multiplier. Since $\Lambda(q)$ is a convex set and $I(\mu) < \infty$ on $\Lambda(q)$, $\mu_0$ is
capacity-achieving if and only if there exists $\lambda \ge 0$ such that the following conditions hold [19]:

(i) $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$;

(ii) for all $\mu \in \Lambda(q)$, $J(\mu_0) \ge J(\mu)$.

Due to the concavity of $I(\mu)$, $J(\mu)$ is also concave. Condition (ii) is then equivalent to the weak derivative
satisfying $J'_{\mu_0}(\mu) \le 0$ for all $\mu \in \Lambda(q)$.
By Lemma 3, the linearity of $E_\mu\{X^2 - \gamma\}$ with respect to (w.r.t.) $\mu$ and Condition (i), $J'_{\mu_0}(\mu)$ can be
easily calculated as

$$J'_{\mu_0}(\mu) = E_\mu\{f_\lambda(X; \mu_0)\}. \tag{56}$$

Therefore, Condition (ii) is equivalent to $E_\mu\{f_\lambda(X; \mu_0)\} \le 0$ for all $\mu \in \Lambda(q)$. Thus Lemma 5 follows.

■

We call $x \in \mathbb{R}$ a point of increase of a measure $\mu$ if $\mu(O) > 0$ for every open subset $O$ of $\mathbb{R}$ containing $x$.
Let $S_\mu$ be the set of points of increase of $\mu$. Based on Lemma 5, we derive another sufficient and necessary
condition for the optimal input distribution, which will be used to prove Property (c) of Theorem 1 in
Section IV-C.

Lemma 6: Let

$$g_\lambda(x; \mu) = q f_\lambda(0; \mu) + (1 - q) f_\lambda(x; \mu). \tag{57}$$

Then $\mu_0 \in \Lambda(\gamma, q)$ achieves the capacity if and only if there exists $\lambda \ge 0$ such that for every $x \in \mathbb{R}$,

$$g_\lambda(x; \mu_0) \le 0. \tag{58}$$

Furthermore, $g_\lambda(x; \mu_0) = 0$ for every $x \in S_{\mu_0} \setminus \{0\}$.

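Proof: The necessity part is shown as follows. Suppose $\mu_0$ achieves the capacity; then by Lemma 5
there exists $\lambda \ge 0$ such that $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$ and $E_\mu\{f_\lambda(X; \mu_0)\} \le 0$ for all $\mu \in \Lambda(q)$. For any
$x \in \mathbb{R} \setminus \{0\}$, choose $\mu$ such that $\mu(\{0\}) = q$ and $\mu(\{x\}) = 1 - q$, so by the fact that $\mu \in \Lambda(q)$, we have

$$0 \ge E_\mu\{f_\lambda(X; \mu_0)\} = q f_\lambda(0; \mu_0) + (1 - q) f_\lambda(x; \mu_0). \tag{59}$$

Due to the continuity of $d(x; \mu_0)$ by Lemma 2, $f_\lambda(x; \mu_0)$ is also continuous so that (59) holds for all
$x \in \mathbb{R}$, i.e., $g_\lambda(x; \mu_0) \le 0$ for every $x \in \mathbb{R}$.
To finish proving the necessity, it suffices to show that $g_\lambda(x; \mu_0) = 0$ for all $x \in S_{\mu_0} \setminus \{0\}$. Evidently,
$g_\lambda(0; \mu_0) = f_\lambda(0; \mu_0)$ and by (17) and $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$,

$$\int f_\lambda(x; \mu_0)\, \mu_0(dx) = 0. \tag{60}$$

Hence,

\begin{align}
\int_{\mathbb{R} \setminus \{0\}} g_\lambda(x; \mu_0)\, \mu_0(dx) &= \int g_\lambda(x; \mu_0)\, \mu_0(dx) - g_\lambda(0; \mu_0)\, \mu_0(\{0\}) \tag{61} \\
&\ge q f_\lambda(0; \mu_0) + (1 - q) \int f_\lambda(x; \mu_0)\, \mu_0(dx) - q f_\lambda(0; \mu_0) \tag{62} \\
&= 0. \tag{63}
\end{align}

Since $g_\lambda(x; \mu_0) \le 0$ for every $x \in \mathbb{R}$, (63) implies that on $\mathbb{R} \setminus \{0\}$, $g_\lambda(x; \mu_0) = 0$ $\mu_0$-almost surely, so
that $g_\lambda(x; \mu_0) = 0$ for all $x \in S_{\mu_0} \setminus \{0\}$ follows immediately.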

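The sufficiency part of Lemma 6 is established as follows. Suppose $g_\lambda(x; \mu_0) \le 0$ for every $x \in \mathbb{R}$.
By integrating $g_\lambda(x; \mu_0)$ w.r.t. $\mu_0$, we have

\begin{align}
q\, g_\lambda(0; \mu_0) &\ge \int g_\lambda(x; \mu_0)\, \mu_0(dx) \tag{64} \\
&= q f_\lambda(0; \mu_0) - (1 - q) \lambda E_{\mu_0}\{X^2 - \gamma\} \tag{65} \\
&\ge q f_\lambda(0; \mu_0) \tag{66}
\end{align}

where (64) holds because $g_\lambda \le 0$ and $\mu_0(\{0\}) \ge q$, (65) is due to (17) and $g_\lambda(0; \mu_0) = f_\lambda(0; \mu_0)$, and (66) follows from $E_{\mu_0}\{X^2\} \le \gamma$ since
$\mu_0 \in \Lambda(\gamma, q)$. Hence, $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$ due to the fact that $q < 1$. Furthermore, for any $\mu \in \Lambda(q)$, by
integrating $g_\lambda(x; \mu_0)$ w.r.t. $\mu$, we have

\begin{align}
q\, g_\lambda(0; \mu_0) &\ge \int g_\lambda(x; \mu_0)\, \mu(dx) \tag{67} \\
&= q f_\lambda(0; \mu_0) + (1 - q) E_\mu\{f_\lambda(X; \mu_0)\}. \tag{68}
\end{align}

Because $g_\lambda(0; \mu_0) = f_\lambda(0; \mu_0)$, we have $E_\mu\{f_\lambda(X; \mu_0)\} \le 0$. Together with $\lambda E_{\mu_0}\{X^2 - \gamma\} = 0$ and
Lemma 5, this implies that $\mu_0$ must be capacity-achieving. ■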

C. Discreteness of $\mu_0$
With Lemma 6 established, we now prove Property (c) in Theorem 1.

Let $\lambda \ge 0$ satisfy condition (58) and $d(z; \mu)$ be defined in (23). We extend functions $f_\lambda(x; \mu)$ in
Lemma 5 and $g_\lambda(x; \mu)$ in Lemma 6 to be defined on the whole complex plane $\mathbb{C}$ as (54) and (57),
respectively, with $x$ replaced by $z \in \mathbb{C}$. By Lemma 2, $f_\lambda(z; \mu)$ is a holomorphic function of $z$ on $\mathbb{C}$,
hence so is $g_\lambda(z; \mu)$. According to Lemma 6, each element in the set $S_{\mu_0} \setminus \{0\}$ is a zero of the function
$g_\lambda(z; \mu_0)$.

Next we show that for any bounded interval $L$ of $\mathbb{R}$, $S_{\mu_0} \cap L$ is a finite set. Suppose, to the contrary,
$S_{\mu_0} \cap L$ is infinite; then it has a limit point in $\mathbb{R}$ by the Bolzano-Weierstrass Theorem [18] and hence,
$g_\lambda(z; \mu_0) = 0$ on the whole complex plane $\mathbb{C}$ by the Identity Theorem [20]. Then, by (16), (54) and (57),
for every $x \in \mathbb{R}$,

$$\int_{-\infty}^{\infty} \phi(y - x)\, r(y)\, dy = 0 \tag{69}$$

where

$$r(y) = \log p_Y(y; \mu_0) + \lambda y^2 + c \tag{70}$$

and $c = \frac{1}{2} \log(2\pi e) + I(\mu_0) - \frac{q}{1 - q} f_\lambda(0; \mu_0) - \lambda(\gamma + 1)$ is a constant.

As in the proof of Lemma 2, there exist $a, b \in \mathbb{R}$ such that $|\log p_Y(y; \mu_0)| \le \frac{1}{2} y^2 + a y + b$. As
a result, there exist some $\alpha, \beta > 0$ such that $|r(y)| \le \alpha y^2 + \beta$. Since the convolution of $r(y)$ and
the Gaussian density is equal to the zero function by (69), $r(y)$ must be the zero function according
to [10, Corollary 9]. This requires the capacity-achieving output distribution $p_Y(y; \mu_0)$ to be Gaussian,
which cannot be true unless $X$ is Gaussian, which contradicts the assumption that $X$ has a probability
mass at 0. Therefore, $S_{\mu_0} \cap L$ must be a finite set for any bounded interval $L$, which further implies that
$S_{\mu_0}$ is at most countable.

Finally, we show that $S_{\mu_0}$ is countably infinite. Suppose, to the contrary, $S_{\mu_0} = \{x_i\}_{i=1}^{N}$ is a finite set
with $\mu_0(\{x_i\}) = p_i$ and $|x_i| < B_1$ for all $i = 1, 2, \dots, N$. For any $y > B_1$,

$$p_Y(y; \mu_0) = \sum_{i=1}^{N} p_i\, \phi(y - x_i) < e^{-\frac{(y - B_1)^2}{2}}. \tag{71}$$



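For any $\epsilon > 0$, choose $B_2 > 0$ such that $\int_{-B_2}^{B_2} \phi(x)\, dx > 1 - \epsilon$. By (16), (54), (57) and (58), for any
$x > B_1 + B_2$, we have

\begin{align}
0 &\ge -\int_{-\infty}^{\infty} \phi(y - x) \log p_Y(y; \mu_0)\, dy - \lambda x^2 - (c + \lambda) \tag{72} \\
&> \int_{x - B_2}^{x + B_2} \phi(y - x)\, \frac{1}{2} (y - B_1)^2\, dy - \lambda x^2 - (c + \lambda) \tag{73} \\
&= \int_{-B_2}^{B_2} \phi(t)\, \frac{1}{2} (x - B_1 + t)^2\, dt - \lambda x^2 - (c + \lambda) \tag{74} \\
&> \frac{1}{2} (x - B_1)^2 (1 - \epsilon) - \lambda x^2 - (c + \lambda). \tag{75}
\end{align}

For (72) to hold for large $x$, $\lambda$ must satisfy $\lambda \ge \frac{1}{2}$.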

To finish the proof, it suffices to show that $\lambda < \frac{1}{2}$ for any $\gamma > 0$, so that a contradiction arises, which
implies that $S_{\mu_0}$ must be countably infinite. For fixed $q \in (0, 1)$, denote the Lagrange multiplier in (58)
as $\lambda(\gamma)$. Denote $C_G(\gamma) = \frac{1}{2} \log(1 + \gamma)$, which is the channel capacity of a Gaussian channel with the
average power constraint only. By the envelope theorem [19], $\lambda(\gamma)$ is the derivative of $C(\gamma, q)$ w.r.t. $\gamma$.
Since $C(0, q) = C_G(0) = 0$ and the derivative of $C_G(\gamma)$ at $\gamma = 0$ is $\frac{1}{2}$, we have $\lambda(0) \le \frac{1}{2}$; otherwise
we could find a small enough $\gamma$ such that $C(\gamma, q)$ would exceed $C_G(\gamma)$, which is obviously impossible.
Next we show that $C(\gamma, q)$ is strictly concave for $\gamma \ge 0$. Suppose $\mu_1$ and $\mu_2$ are the capacity-achieving
input distributions of (5) for different power constraints $\gamma_1$ and $\gamma_2$, respectively. Due to Property (b) in
Theorem 1, $\mu_1$ and $\mu_2$ must be different. Define $\mu_\theta = \theta\mu_1 + (1 - \theta)\mu_2$ for $\theta \in (0, 1)$. It is easy to see
that $\mu_\theta$ satisfies that the duty cycle is no greater than $1 - q$ and the average input power is no greater
than $\theta\gamma_1 + (1 - \theta)\gamma_2$. Now we have

\begin{align}
C(\theta\gamma_1 + (1 - \theta)\gamma_2, q) &\ge I(\mu_\theta) \tag{76} \\
&> \theta I(\mu_1) + (1 - \theta) I(\mu_2) \tag{77} \\
&= \theta C(\gamma_1, q) + (1 - \theta) C(\gamma_2, q), \tag{78}
\end{align}

where (77) is due to the strict concavity of $I(\mu)$. Therefore, the strict concavity of $C(\gamma, q)$ for $\gamma \ge 0$
follows, which implies that $\lambda(\gamma) < \lambda(0) \le \frac{1}{2}$ for all $\gamma > 0$.
V. Proof of Theorem 2 (the Case of Realistic Duty Cycle Constraint)

A. Stationarity of the Capacity-achieving Input Distribution 

We first establish the fact that a stationary distribution achieves the capacity of the AWGN channel with the realistic duty cycle constraint and power constraint.

Proposition 1: A stationary distribution$^1$ achieves

$$\max_{\mu \in \Lambda^n(\gamma, q, c)} I(X^n; Y^n). \tag{79}$$

Proof: Let $T_k(\cdot)$ be a $k$-cyclic-shift operator on $\mu \in \Lambda^n(\gamma, q, c)$, defined as

$$T_k(\mu) = \mu_{X_{k+1}, \dots, X_n, X_1, \dots, X_k} \tag{80}$$

where $k = 1, \dots, n - 1$, and specifically $T_0(\mu) = \mu$. For any distribution $\mu$ in $\Lambda^n(\gamma, q, c)$, a distribution $\nu$ on $X^n$ can be defined as

$$\nu = \frac{1}{n}\sum_{k=0}^{n-1} T_k(\mu). \tag{81}$$

According to the concavity of the mutual information $I(\cdot)$,

\begin{align}
I(\nu) &= I\left(\frac{1}{n}\sum_{k=0}^{n-1} T_k(\mu)\right) \tag{82}\\
&\ge \frac{1}{n}\sum_{k=0}^{n-1} I(T_k(\mu)) \tag{83}\\
&= I(\mu) \tag{84}
\end{align}

where $I(T_k(\mu)) = I(\mu)$ since the AWGN channel (1) is memoryless and time-invariant. Evidently $\nu$ is a stationary distribution and satisfies the duty cycle constraint and power constraint, i.e., $\nu \in \Lambda^n(\gamma, q, c)$; hence Proposition 1 is established. ■
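
As a numerical illustration of the averaging step in (81), the following sketch builds $\nu$ from an arbitrary joint pmf on a small finite alphabet and checks that the result is invariant under cyclic shifts. The finite alphabet, the array layout `p[x1, ..., xn]`, and the function names are assumptions made for illustration only; they do not appear in the paper.

```python
import numpy as np

def cyclic_shift(p, k):
    """Return the pmf of (X_{k+1}, ..., X_n, X_1, ..., X_k), i.e. T_k(mu) in (80).

    p is an n-dimensional array with p[x1, ..., xn] = mu(x1, ..., xn).
    """
    n = p.ndim
    axes = [(i + k) % n for i in range(n)]
    return np.transpose(p, axes)

def cyclic_average(p):
    """Return nu = (1/n) * sum_{k=0}^{n-1} T_k(mu), as in (81)."""
    n = p.ndim
    return sum(cyclic_shift(p, k) for k in range(n)) / n

# Hypothetical example: n = 4 symbols over a 3-letter alphabet.
rng = np.random.default_rng(0)
mu = rng.random((3, 3, 3, 3))
mu /= mu.sum()
nu = cyclic_average(mu)

# Marginal of (X1, X2) and marginal of (X2, X3) under nu.
pair_12 = nu.sum(axis=(2, 3))
pair_23 = nu.sum(axis=(0, 3))
```

Under these assumptions, `pair_12` and `pair_23` coincide, which is the shift invariance required by the footnote definition of stationarity.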

$^1$A distribution $\nu$ on $X^n$ is stationary if it satisfies

$$\nu_{X_s, \dots, X_t} = \nu_{X_{s+k}, \dots, X_{t+k}}$$

for any indices $s, t, k$ satisfying

$$1 \le s \le t \le n, \quad 1 \le s + k \le t + k \le n.$$

According to Proposition 1, for any $n$, $I(X^n; Y^n)$ is maximized by a stationary distribution. Therefore, as $n$ goes to infinity, the capacity in (9) is achieved by a stationary input distribution.

B. The Input-output Mutual Information 

Proposition 2: Let the input follow a stationary distribution $\mu \in \Lambda(\gamma, q, c)$. The limit of the input-output mutual information per symbol as a function of $\mu$ can be expressed as

$$I(\mu) = I(X; Y) - h(Y) + h(\mathcal{Y}) \tag{85}$$

where $I(X; Y)$ is the mutual information of the AWGN channel between the input $X$, which follows distribution $\mu_{X_1}$, and the corresponding output $Y$, $h(Y)$ is the differential entropy of $Y$, and $h(\mathcal{Y})$ is the differential entropy rate of the output process $\{Y_i\}$.

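Proof: The mutual information between $X^n$ and $Y^n$ can be expressed using relative entropies:

\begin{align}
I(X^n; Y^n) &= D(P_{Y^n|X^n} \,\|\, P_{Y^n} \mid P_{X^n}) \tag{86}\\
&= D(P_{Y^n|X^n} \,\|\, P_{Y_1} \times \cdots \times P_{Y_n} \mid P_{X^n}) - D(P_{Y^n} \,\|\, P_{Y_1} \times \cdots \times P_{Y_n}) \tag{87}\\
&= \sum_{k=1}^{n} D(P_{Y_k|X_k} \,\|\, P_{Y_k} \mid P_{X^n}) - E\left\{\log P_{Y^n}(Y^n) - \sum_{i=1}^{n} \log P_{Y_i}(Y_i)\right\} \tag{88}\\
&= nI(X; Y) - nh(Y) + h(Y^n). \tag{89}
\end{align}

Then

\begin{align}
I(\mu) &= \lim_{n\to\infty} \frac{1}{n} I(X^n; Y^n) \tag{90}\\
&= I(X; Y) - h(Y) + \lim_{n\to\infty} \frac{1}{n} h(Y^n) \tag{91}\\
&= I(X; Y) - h(Y) + h(\mathcal{Y}). \tag{92}
\end{align}

Proposition 2 is established. ■

When the input is an i.i.d. random process, the output process is also i.i.d., so that $h(Y) = h(\mathcal{Y})$. This implies the following corollary.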

Corollary 1: Among all i.i.d. distributions, the one that maximizes the mutual information under duty cycle constraint $(q, c)$ and average power constraint $\gamma$ can be solved from the following optimization:

\begin{align}
\underset{P_X}{\text{maximize}} \quad & I(X; Y) \\
\text{subject to} \quad & P_X(0) - 2cP_X(0)(1 - P_X(0)) \ge q, \tag{93}\\
& E\{X^2\} \le \gamma.
\end{align}

In the special case of no transition cost, i.e., $c = 0$, the result of (93) is equal to that of (5).
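
To make the duty cycle constraint in (93) concrete, the sketch below solves for the smallest zero mass $P_X(0)$ that an i.i.d. input needs. It assumes $0 < q < 1$ and $c \ge 0$ and reads the constraint as $P_X(0) - 2cP_X(0)(1 - P_X(0)) \ge q$; the function name is hypothetical.

```python
import math

def min_zero_mass_iid(q, c):
    """Smallest p0 = P_X(0) in [0, 1] with p0 - 2*c*p0*(1 - p0) >= q, cf. (93).

    The left side equals 2c*p0^2 + (1 - 2c)*p0, a convex quadratic that is 0 at
    p0 = 0 and 1 at p0 = 1, so the feasible set is [p_star, 1] with p_star the
    positive root of 2c*p0^2 + (1 - 2c)*p0 - q = 0.
    """
    if c == 0:
        return q
    b = 1.0 - 2.0 * c
    return (-b + math.sqrt(b * b + 8.0 * c * q)) / (4.0 * c)

# Parameters used later in the numerical section: q = 0.5, c = 1.0.
p0 = min_zero_mass_iid(q=0.5, c=1.0)
slack = p0 - 2.0 * 1.0 * p0 * (1.0 - p0) - 0.5
```

For $q = 0.5$ and $c = 1.0$ this gives $P_X(0) \approx 0.81$, so an i.i.d. input must stay silent far more often than the nominal 50% to pay for its transitions.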



C. Proof of Theorem 2

The mutual information expressed by (85) is hard to optimize, even if the input is restricted to Markov processes. To simplify the matter, we introduce a lower bound on $I(\mu)$, which is given by $L(\mu)$ in (11).

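Property (a): Using the fact that processing reduces relative entropy and that $\mu$ is specified as a stationary probability distribution, we have

\begin{align}
\frac{1}{n} D(P_{Y^n} \,\|\, P_{Y_1} \times P_{Y_2} \times \cdots \times P_{Y_n}) &\le \frac{1}{n} D(P_{X^n} \,\|\, P_{X_1} \times P_{X_2} \times \cdots \times P_{X_n}) \tag{94}\\
&= \frac{1}{n} \sum_{k=1}^{n-1} D(P_{X_k|X_{k+1}^n} \,\|\, P_{X_k} \mid P_{X_{k+1}^n}) \tag{95}\\
&= \frac{1}{n} \sum_{k=2}^{n} I(X_1; X_2^k). \tag{96}
\end{align}

Therefore

$$\lim_{n\to\infty} \frac{1}{n} D(P_{X^n} \,\|\, P_{X_1} \times P_{X_2} \times \cdots \times P_{X_n}) = I(X_1; X_2^\infty) \tag{97}$$

using the fact that the Cesaro mean of the sequence $I(X_1; X_2^k)$ is $I(X_1; X_2^\infty)$. Applying (85), (87) and (97),

$$L(\mu) = I(X; Y) - I(X_1; X_2^\infty) \le I(\mu) \le C(\gamma, q, c). \tag{98}$$

Thus Property (a) is established.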

Property (b): For any $\mu \in \Lambda(\gamma, q, c)$, which is not Markov in general, its first-order Markov approximation $\nu$ is defined by

$$\nu_{X_1, \dots, X_n} = \mu_{X_1}\,\mu_{X_2|X_1}\,\mu_{X_3|X_2} \cdots \mu_{X_n|X_{n-1}}. \tag{99}$$

Evidently, $\nu$ and $\mu$ have identical marginal distributions, $\nu_{X_i} = \mu_{X_i}$, and also identical joint distributions of any consecutive pair, $\nu_{X_i, X_{i+1}} = \mu_{X_i, X_{i+1}}$. Therefore

$$\nu_{X_i}(\{0\}) = \mu_{X_i}(\{0\}) \tag{100}$$

and

$$\nu_{X_i, X_{i+1}}(\{x_i = 0, x_{i+1} \ne 0\}) = \mu_{X_i, X_{i+1}}(\{x_i = 0, x_{i+1} \ne 0\}). \tag{101}$$

Since $\mu \in \Lambda(\gamma, q, c)$, we have $\nu \in \Lambda(\gamma, q, c)$. Let $\{X_i\}$ follow distribution $\mu$ and $\{Z_i\}$ follow distribution $\nu$. Then



\begin{align}
I(Z_1; Z_2^\infty) &= I(Z_1; Z_2) + I(Z_1; Z_3^\infty \mid Z_2) \tag{102}\\
&= I(Z_1; Z_2) \tag{103}\\
&= I(X_1; X_2) \tag{104}\\
&\le I(X_1; X_2^\infty) \tag{105}
\end{align}

where equality holds if and only if $\{X_i\}$ is a first-order Markov process. By (11) and (105), $L(\nu) \ge L(\mu)$.

Hence, for any $\mu$ that maximizes $L(\cdot)$, the Markov approximation $\nu$ generated from $\mu$ by (99) satisfies $L(\nu) \ge L(\mu)$, so $L(\cdot)$ must be maximized by a first-order Markov process.
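
The chain (102)-(105) can be checked numerically on a toy example. The sketch below forms the Markov approximation (99) for $n = 3$ from a random joint pmf and compares the three mutual informations. The three-symbol restriction, the random pmf (which is not required to be stationary here), and all function names are illustrative assumptions.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in nats for a 2-D joint pmf with joint[a, b] = P(A=a, B=b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))

def markov_approximation(p):
    """nu(x1,x2,x3) = mu(x1,x2) * mu(x3|x2): the n = 3 case of (99)."""
    p12 = p.sum(axis=2)          # joint of (X1, X2)
    p23 = p.sum(axis=0)          # joint of (X2, X3)
    p2 = p.sum(axis=(0, 2))      # marginal of X2 (assumed strictly positive)
    return p12[:, :, None] * p23[None, :, :] / p2[None, :, None]

rng = np.random.default_rng(1)
mu = rng.random((3, 3, 3))
mu /= mu.sum()
nu = markov_approximation(mu)

m = mu.shape[0]
i_pair = mutual_information(mu.sum(axis=2))       # I(X1; X2)
i_mu = mutual_information(mu.reshape(m, m * m))   # I(X1; X2, X3)
i_nu = mutual_information(nu.reshape(m, m * m))   # I(Z1; Z2, Z3)
```

In this example `i_nu` equals `i_pair` and does not exceed `i_mu`, mirroring (103)-(105).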

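Property (c): Suppose $\nu$ is a stationary first-order Markov process, sufficiently described by $\nu = \{\mathcal{X}, P_{X_2|X_1}\}$, where $\mathcal{X}$ is the state space of $\nu$ and $P_{X_2|X_1}$ is the transition probability distribution. Define a new first-order Markov process $\tilde{\nu}$ from $\nu$ as follows.

Definition 1: Let $\tilde{\nu}$, defined on the same state space $\mathcal{X}$ as $\nu$, be a first-order Markov process denoted by $(\mathcal{X}, P_{Z_2|Z_1})$, where

$$P_{Z_2|Z_1}(z_2|z_1) = \begin{cases}
\alpha, & z_1 = 0,\ z_2 = 0, \\
\dfrac{1 - \alpha}{\eta} P_X(z_2), & z_1 = 0,\ z_2 \ne 0, \\
1 - \beta, & z_1 \ne 0,\ z_2 = 0, \\
\dfrac{\beta}{\eta} P_X(z_2), & z_1 \ne 0,\ z_2 \ne 0,
\end{cases} \tag{106}$$

where

$$S_1 = \mathcal{X} \setminus \{0\} \tag{107}$$

and

\begin{align}
\alpha &= P_{X_2|X_1}(0|0) \tag{108}\\
\beta &= P(X_2 \in S_1 \mid X_1 \in S_1) \tag{109}\\
\eta &= P(X \in S_1). \tag{110}
\end{align}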



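The process $\tilde{\nu}$ is described by $(\mathcal{X}, \alpha, \beta, P_X)$. It is easy to prove that the stationary distribution $P_Z$ of $\tilde{\nu}$ is equal to $P_X$ of $\nu$, $\nu \in \Lambda(\gamma, q, c)$. Moreover, $\tilde{\nu}$ satisfies the same power and duty cycle constraints that $\nu$ satisfies, i.e., $\tilde{\nu} \in \Lambda(\gamma, q, c)$. Furthermore, let $B_i = 1_{\{X_i \ne 0\}}$; then

\begin{align}
P_{B_2|B_1}(0|0) &= \alpha \tag{111}\\
P_{B_2|B_1}(1|1) &= \beta. \tag{112}
\end{align}

Let $b_i = 1_{\{z_i \ne 0\}}$. Since

$$P_{Z_2|Z_1}(z_2|z_1) = P_{B_2|B_1}(b_2|b_1)\,\frac{P_X(z_2)}{P_{B_2}(b_2)}, \tag{113}$$

$Z_i$ and $Z_{i+1}$ are independent given $B_i = 1_{\{Z_i \ne 0\}}$ and $B_{i+1} = 1_{\{Z_{i+1} \ne 0\}}$.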
Based on (106) to (113), it is easy to see that

\begin{align}
I(Z_1; Z_2) &= E\left\{\log \frac{P_{Z_2|Z_1}(Z_2|Z_1)}{P_{Z_2}(Z_2)}\right\} \tag{114}\\
&= E\left\{\log \frac{P_{B_2|B_1}(B_2|B_1)}{P_{B_2}(B_2)}\right\} \tag{115}\\
&= I(B_1; B_2) \tag{116}\\
&\le I(X_1; X_2). \tag{117}
\end{align}

The inequality in (117) follows since $X_1 \to X_2 \to B_2$ forms a Markov chain, so that $I(X_1; B_2) \le I(X_1; X_2)$ [21], and $B_2 \to X_1 \to B_1$ also forms a Markov chain, so that $I(B_2; B_1) \le I(B_2; X_1)$.
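
The construction in Definition 1 can be sketched in a few lines. The code below assembles the transition matrix from $(\mathcal{X}, \alpha, \beta, P_X)$, reading (106) as: the next symbol is zero with probability $\alpha$ (from zero) or $1 - \beta$ (from a nonzero symbol), and otherwise is drawn from $P_X$ restricted to the nonzero letters. The three-letter alphabet and the numerical values are hypothetical; $\beta$ is chosen from $\alpha$ and $\eta$ so that $P_X$ is stationary.

```python
import numpy as np

def build_transition(states, p_x, alpha, beta):
    """Transition matrix T[z1, z2] = P(Z2 = z2 | Z1 = z1), cf. (106).

    Assumes exactly one state equals 0 and p_x sums to 1.
    """
    states = np.asarray(states, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    nonzero = states != 0
    eta = p_x[nonzero].sum()                       # eta = P(X != 0), cf. (110)
    T = np.zeros((len(states), len(states)))
    for i in range(len(states)):
        to_zero = (1.0 - beta) if nonzero[i] else alpha
        T[i, ~nonzero] = to_zero
        T[i, nonzero] = (1.0 - to_zero) * p_x[nonzero] / eta
    return T

def mutual_information(joint):
    """I(A;B) in nats for a 2-D joint pmf."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa * pb)[mask])))

# Hypothetical parameters.
states = np.array([-2.0, 0.0, 2.0])
p_x = np.array([0.2, 0.6, 0.2])
alpha, eta = 0.8, 0.4
beta = 1.0 - (1.0 - eta) * (1.0 - alpha) / eta     # keeps p_x stationary (0.7 here)

T = build_transition(states, p_x, alpha, beta)
joint_z = p_x[:, None] * T                         # joint pmf of (Z1, Z2)
nz = states != 0
# Joint pmf of the indicators (B1, B2), with index 0 for "zero" and 1 for "nonzero".
joint_b = np.array([[joint_z[a][:, b].sum() for b in (~nz, nz)] for a in (~nz, nz)])
```

With these values, `p_x @ T` reproduces `p_x`, and `mutual_information(joint_z)` matches `mutual_information(joint_b)`, in line with (114)-(116).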
The discreteness of the optimized input distribution is proved in the following. According to Properties (b) and (c), the lower bound $L(\cdot)$ is maximized by a first-order Markov process whose transition probability distribution $P_{X_2|X_1}$ can be expressed as

$$P_{X_2|X_1}(x_2|x_1) = P_{B_2|B_1}(b_2|b_1)\,\frac{P_X(x_2)}{P_{B_2}(b_2)} \tag{118}$$

where $b_i = 1_{\{x_i \ne 0\}}$, $P_X = \mu_X$ and $P_{X_2|X_1} = \mu_{X_2|X_1}$. Then the maximum of $L(\mu)$ can be achieved by the following optimization:

\begin{align}
\underset{q_0}{\text{maximize}} \quad & I_X(q_0) - I_B(q_0) \tag{119}\\
\text{subject to} \quad & I_X(q_0) = \underset{P_X}{\text{maximize}}\ I(X; Y) \tag{120}\\
& I_B(q_0) = \underset{P_{B_2|B_1}}{\text{minimize}}\ I(B_1; B_2) \tag{121}\\
& P_X(0) = P_{B_1}(0) = P_{B_2}(0) = q_0 \tag{122}\\
& q_0 - 2cq_0 P_{B_2|B_1}(1|0) \ge q. \tag{123}
\end{align}
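
The inner minimization (121) involves only a two-state chain, so it can be explored by a one-dimensional search. The sketch below parameterizes the chain by $a = P_{B_2|B_1}(1|0)$, uses stationarity to fix $P_{B_2|B_1}(0|1) = q_0 a/(1 - q_0)$, and searches a grid of values of $a$ that satisfy (123). The grid search, the function names, and the use of nats are assumptions for illustration; this is not the authors' solver.

```python
import numpy as np

def h2(p):
    """Binary entropy in nats, elementwise, with 0 log 0 = 0."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    out = np.zeros_like(p)
    for t in (p, 1.0 - p):
        pos = t > 0
        out[pos] -= t[pos] * np.log(t[pos])
    return out

def min_binary_mi(q0, q, c, grid=2001):
    """Grid-search estimate of I_B(q0) in (121) under (122)-(123).

    Requires 0 < q0 < 1 and q0 >= q. Returns (minimum I(B1;B2), minimizing a).
    """
    a_max = min(1.0, (1.0 - q0) / q0)        # keeps P(0|1) = q0*a/(1-q0) <= 1
    if c > 0:
        a_max = min(a_max, (q0 - q) / (2.0 * c * q0))   # constraint (123)
    if a_max <= 0:
        raise ValueError("no feasible transition probability")
    a = np.linspace(a_max / grid, a_max, grid)
    if 1.0 - q0 <= a_max:
        a = np.append(a, 1.0 - q0)           # independent (i.i.d.) chain is feasible
    b = q0 * a / (1.0 - q0)                  # P(B2=0 | B1=1) from stationarity
    mi = h2(q0)[0] - q0 * h2(a) - (1.0 - q0) * h2(b)
    k = int(np.argmin(mi))
    return float(mi[k]), float(a[k])

loose_value, loose_a = min_binary_mi(q0=0.9, q=0.5, c=1.0)
tight_value, tight_a = min_binary_mi(q0=0.8, q=0.5, c=1.0)
```

When $q_0$ is large enough that independent indicators already satisfy (123), the minimum is zero; otherwise the search settles on the largest transition probability that (123) allows, and $I_B(q_0)$ is strictly positive.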

Since, for any given $q_0 \ge q > 0$, $I_X(q_0) - I_B(q_0)$ is maximized by attaining the maximum of $I_X(q_0)$ and the minimum of $I_B(q_0)$ separately, the maximization of (119) must be achieved by the $P_X$ that maximizes $I(X; Y)$ for the given $q_0$. Therefore, given $q_0$, the maximization in (120) is similar to the problem in Theorem 1. The difference from Theorem 1 is that in (120) the distribution $P_X$ satisfies $P_X(0) = q_0 \ge q$, whereas in Theorem 1 the distribution $P_X$ satisfies $P_X(0) \ge q$. Define

$$\Lambda_0(\gamma, q_0) = \{\mu \in \mathcal{P} : \mu(\{0\}) = q_0,\ E_\mu\{X^2\} \le \gamma\} \tag{124}$$

where $\mu$ is the marginal input distribution of the first-order Markov process. We can establish the following lemma.

Lemma 7: $\Lambda_0(\gamma, q_0)$ is compact in the topological space $\mathcal{P}$.
Proof: As mentioned in Lemma 1, the topology of weak convergence on $\mathcal{P}$ is metrizable with the Levy-Prohorov metric [15], defined as

$$L(\mu, \nu) = \inf\{\delta : \mu(A) \le \nu(A^{(\delta)}) + \delta \text{ and } \nu(A) \le \mu(A^{(\delta)}) + \delta \text{ for all } A \in \mathcal{B}\} \tag{125}$$

for any $\mu, \nu \in \mathcal{P}$, where $A^{(\delta)}$ denotes the set of all $x \in \mathbb{R}$ which lie at a distance less than $\delta$ from $A$.

As in the proof of Lemma 1, it suffices to show that $\Lambda_0(\gamma, q_0)$ is both tight and closed in $\mathcal{P}$. The tightness can be shown by the same arguments as in Lemma 1. In the following, we prove that $\Lambda_0(\gamma, q_0)$ is closed in $\mathcal{P}$.

Let $B_m = [-\frac{1}{m}, \frac{1}{m}]$ for $m = 1, 2, \dots$. Let $\{\mu_n\}_{n=1}^{\infty}$ be a convergent sequence in $\Lambda_0(\gamma, q_0)$ with limit $\mu_0$. For any $m \in \mathbb{N}$, there exists an $n_m$ such that $L(\mu_n, \mu_0) < \frac{1}{m}$ for all $n > n_m$. By the definition of $L$ in (125), we have for any $m \in \mathbb{N}$ and $n > n_m$,

$$\mu_0(\{0\}) < \mu_n(B_m) + \frac{1}{m}, \tag{126}$$

and

$$\mu_n(\{0\}) < \mu_0(B_m) + \frac{1}{m}. \tag{127}$$

For any $n \in \mathbb{N} \cup \{0\}$, we have

$$\mu_n(\{0\}) = \mu_n\left(\bigcap_{m=1}^{\infty} B_m\right) = \lim_{m\to\infty} \mu_n(B_m), \tag{128}$$

so for any $m \in \mathbb{N}$, there exists an $n'_m$ such that $\mu_n(B_m) < \mu_n(\{0\}) + \frac{1}{m}$. Therefore, according to (126) and (127), for all $m \in \mathbb{N}$ and $n > \max\{n_m, n'_m\}$,

$$q_0 - \frac{2}{m} < \mu_0(\{0\}) < q_0 + \frac{2}{m}. \tag{129}$$

Thus we have $\mu_0(\{0\}) = q_0$ by letting $m \to \infty$.

Moreover, let $f(x) = x^2$, which is continuous and bounded below. By weak convergence [15, Section 3.1], we have

$$E_{\mu_0}\{X^2\} = \int f\,d\mu_0 \le \liminf_{n\to\infty} \int f\,d\mu_n \le \gamma. \tag{130}$$

Together with $\mu_0(\{0\}) = q_0$, we have $\mu_0 \in \Lambda_0(\gamma, q_0)$, i.e., $\Lambda_0(\gamma, q_0)$ is closed, and the compactness of $\Lambda_0(\gamma, q_0)$ then follows. ■

Now $P_X$ can be proved to be discrete by following the same development as in the proof of Theorem 1, with Lemma 1 replaced by Lemma 7. Because $P_X$ is the stationary distribution of the Markov process, the maximum of the lower bound $L(\cdot)$ is achieved by a discrete first-order Markov process.

Based on Theorem 2, in order to find the lower bound of the capacity, we can maximize $L(\mu)$ and obtain an optimized discrete first-order Markov input $\mu^* = \{\mathcal{X}, \alpha, \beta, P_X\}$ in $\Lambda(\gamma, q, c)$. Let $\mu_0$ denote the capacity-achieving distribution; then

$$I(\mu_0) \ge I(\mu^*) \ge L(\mu^*). \tag{131}$$

In Section VI-A we develop a computationally efficient scheme to determine $\mu^*$, which is a good approximation of the capacity-achieving input $\mu_0$.

VI. Numerical methods and results 
A. Computation of the entropy of Hidden Markov Processes 



In order to numerically calculate the mutual information (85), it is important to compute the differential entropy rate of a hidden Markov process (HMP) generated by a Markov input through the AWGN channel. Computing the (differential) entropy rate of HMPs is a hard problem. Most works in this area focus on the entropy rate of a binary Markov input through various channels. Reference [22] solves a linear system for the stationary distribution of the quantized Markov process to obtain a good approximation of the entropy rate for the HMP output generated by a binary Markov input through a binary symmetric channel. In [23], the entropy rate of the HMP generated by a binary-symmetric Markov input through arbitrary memoryless channels is studied and a numerical method is presented based on quantizing a fixed-point functional equation. Based on these existing studies, a Monte Carlo algorithm is provided in this paper to compute the differential entropy rate of HMPs generated from an $m$-state Markov chain ($m \ge 3$) through the AWGN channel. We sketch the main ideas of our algorithm for computing the differential entropy rate in this subsection.

Based on Blackwell's work [24], the entropy of HMPs can be expressed as an expectation over the distribution of the conditional distribution of $X_0$ given the past observations $Y_{-\infty}^0$. In order to estimate $P_{X_0|Y_{-\infty}^0}$, first define the log-likelihood ratio:

$$L_n^{(i)} = \log \frac{P_{X_n|Y^n}(x^{(i)}|Y^n)}{P_{X_n|Y^n}(x^{(0)}|Y^n)}, \quad i = 0, 1, \dots, m - 1 \tag{132}$$

where $m$ is the number of states of the Markov chain, $x^{(i)} \in \mathcal{X}$ is the $i$th state and $\mathcal{X}$ is the state space of the Markov chain. It is obvious that $L_n^{(0)} = 0$. Then, given $L_n = \{L_n^{(1)}, \dots, L_n^{(m-1)}\}$, $P_{X_n|Y^n}(x^{(i)}|Y^n)$ can be calculated as

$$P_{X_n|Y^n}(x^{(i)}|Y^n) = \frac{e^{L_n^{(i)}}}{\sum_{j=0}^{m-1} e^{L_n^{(j)}}} \tag{133}$$

and when $n \to \infty$, (133) converges to $P_{X_0|Y_{-\infty}^0}(x^{(i)}|Y_{-\infty}^0)$.



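In addition, $L_{n+1}$ can be calculated from $L_n$ iteratively as

$$L_{n+1}^{(i)} = R^{(i)}(Y_{n+1}) + F^{(i)}(L_n) \tag{134}$$

where

\begin{align}
R^{(i)}(y_{n+1}) &= \left(x^{(i)} - x^{(0)}\right) y_{n+1} - \frac{1}{2}\left(\left(x^{(i)}\right)^2 - \left(x^{(0)}\right)^2\right) \tag{135}\\
F^{(i)}(L_n) &= \log \frac{\sum_{j=0}^{m-1} P_{X_2|X_1}(x^{(i)}|x^{(j)})\,e^{L_n^{(j)}}}{\sum_{j=0}^{m-1} P_{X_2|X_1}(x^{(0)}|x^{(j)})\,e^{L_n^{(j)}}}. \tag{136}
\end{align}

The detailed derivation of (134) is as follows:

\begin{align}
L_{n+1}^{(i)} &= \log \frac{P_{X_{n+1}|Y^{n+1}}(x^{(i)}|Y^{n+1})}{P_{X_{n+1}|Y^{n+1}}(x^{(0)}|Y^{n+1})} \tag{137}\\
&= \log \frac{\sum_{j=0}^{m-1} P_{Y_{n+1}|X_{n+1}}(Y_{n+1}|x^{(i)})\,P_{X_2|X_1}(x^{(i)}|x^{(j)})\,P_{X_n|Y^n}(x^{(j)}|Y^n)}{\sum_{j=0}^{m-1} P_{Y_{n+1}|X_{n+1}}(Y_{n+1}|x^{(0)})\,P_{X_2|X_1}(x^{(0)}|x^{(j)})\,P_{X_n|Y^n}(x^{(j)}|Y^n)} \tag{138}\\
&= \log \frac{P_{Y_{n+1}|X_{n+1}}(Y_{n+1}|x^{(i)})}{P_{Y_{n+1}|X_{n+1}}(Y_{n+1}|x^{(0)})} + \log \frac{\sum_{j=0}^{m-1} P_{X_2|X_1}(x^{(i)}|x^{(j)})\,e^{L_n^{(j)}}}{\sum_{j=0}^{m-1} P_{X_2|X_1}(x^{(0)}|x^{(j)})\,e^{L_n^{(j)}}} \tag{139}\\
&= R^{(i)}(Y_{n+1}) + F^{(i)}(L_n). \tag{140}
\end{align}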
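
The recursion (134) is the familiar forward (Bayes) filter written in log-likelihood-ratio form, and the equivalence can be checked numerically. The sketch below runs both on a short, arbitrary observation sequence. The state values, transition matrix, initial posterior, and function names are hypothetical; unit noise variance and a strictly positive transition matrix are assumed.

```python
import numpy as np

def gauss(t):
    """Standard normal density."""
    return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def llr_step(L, y, x, T):
    """One step of (134). T[j, i] = P(X_{n+1} = x[i] | X_n = x[j]), all entries > 0."""
    mix = np.exp(L - L.max()) @ T               # proportional to sum_j T[j, i] e^{L_j}
    F = np.log(mix) - np.log(mix[0])            # second term of (134), cf. (139)
    R = (x - x[0]) * y - 0.5 * (x**2 - x[0]**2) # first term of (134), cf. (135)
    return R + F

def forward_step(post, y, x, T):
    """Standard forward filter step: predict with T, weight by the likelihood."""
    new = gauss(y - x) * (post @ T)
    return new / new.sum()

x = np.array([0.0, -1.5, 1.5])                  # x[0] is the reference state x^(0)
T = np.array([[0.80, 0.10, 0.10],
              [0.30, 0.35, 0.35],
              [0.30, 0.35, 0.35]])
post = np.array([0.6, 0.2, 0.2])                # arbitrary initial posterior
L = np.log(post / post[0])

for y in [0.3, -1.1, 2.0, 0.1]:
    post = forward_step(post, y, x, T)
    L = llr_step(L, y, x, T)

posterior_from_llr = np.exp(L) / np.exp(L).sum()   # (133)
```

After any number of steps, `posterior_from_llr` agrees with `post`, and the reference coordinate `L[0]` stays at zero.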

For the hidden Markov processes observed through the AWGN channel (1), the entropy of HMPs can be computed as [24]

$$h(\mathcal{Y}) = -\lim_{n\to\infty} \iint r(y, l_n) \log r(y, l_n)\,dy\,dP_{L_n}(l_n) \tag{141}$$

where

$$r(y, L_n) = \sum_{i=0}^{m-1} \phi\left(y - x^{(i)}\right) \sum_{j=0}^{m-1} \frac{e^{L_n^{(j)}}}{\sum_{k=0}^{m-1} e^{L_n^{(k)}}}\,P_{X_2|X_1}(x^{(i)}|x^{(j)}). \tag{142}$$

In order to compute the entropy rate of HMPs based on (141), the key is to estimate the probability distribution of $L_n$, $P_{L_n}$. In [22], for binary Markov input and the binary symmetric channel, $L_n$ is considered as a 1-dim $M$-state Markov chain by quantizing the dynamic system expressed in (134). Then the distribution of $L_n$ is the stationary distribution of the quantized Markov process and can be computed easily by solving for an eigenvector. In this paper, because the number of states of the Markov input, $m$, is larger than 2 and the HMP is observed through the AWGN channel, directly quantizing the dynamic system (134) will generate a quantized Markov chain with $M^{m-1}$ states, which is very difficult to deal with when a large $M$ is selected for good estimation precision.



According to (134), since $L_{n+1}$ depends only on $L_n$ and $Y_{n+1}$, $\{L_n\}$ can be considered as a Markov process. In order to compute the stationary probability distribution $P_{L_\infty}$, we can evolve the distribution of $L_n$ based on (134) from any initial distribution $P_{L_0}$. When $n$ is large enough, the distribution $P_{L_n}$ converges to $P_{L_\infty}$. A Monte Carlo algorithm for approximating $h(\mathcal{Y})$ is introduced as follows:

1) Initialize $M$ particles $\{L_{0,1}, \dots, L_{0,M}\}$; $L_{0,k}$ can be simply sampled from the $(m-1)$-dim uniform distribution with each dimension on $[-\max_i(x^{(i)}), \max_i(x^{(i)})]$.

2) For $n = 0, 1, 2, \dots, N$, iteratively evolve the particles $\{L_{n,1}, \dots, L_{n,M}\}$ based on (134), where each $y_{n+1,k}$ is sampled according to $r(y, L_{n,k})$.

3) When $N$ is large enough, $\{L_{N,k}\}$ can be used to estimate $h(\mathcal{Y})$ as

$$h(\mathcal{Y}) \approx -\frac{1}{M} \sum_{k=1}^{M} \int r(y, L_{N,k}) \log r(y, L_{N,k})\,dy. \tag{143}$$

When $M$ is very large, a histogram method can be used to describe $\{L_{N,k}\}$ and reduce the computational load.
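
Steps 1) to 3) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes unit noise variance, a strictly positive transition matrix `T[j, i] = P(x[i] | x[j])`, initialization on $[-\max_i |x^{(i)}|, \max_i |x^{(i)}|]$, and a plain Riemann sum on a fixed grid for the integral in (143). All function and parameter names are hypothetical.

```python
import numpy as np

def gauss(t):
    """Standard normal density."""
    return np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def entropy_rate_mc(x, T, num_particles=1000, num_steps=100, seed=0):
    """Particle estimate (in nats) of the output entropy rate, following 1)-3)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    T = np.asarray(T, dtype=float)
    m = len(x)
    span = np.abs(x).max()

    # Step 1: particles L_{0,k}; coordinate 0 is the reference and stays at 0.
    L = np.zeros((num_particles, m))
    L[:, 1:] = rng.uniform(-span, span, size=(num_particles, m - 1))

    def next_state_probs(L):
        post = np.exp(L - L.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)    # posterior (133)
        return post @ T                             # mixture weights of r(y, L) in (142)

    # Step 2: sample y ~ r(y, L) and update every particle with (134).
    for _ in range(num_steps):
        w = next_state_probs(L)
        u = rng.random((num_particles, 1))
        idx = np.minimum((u > np.cumsum(w, axis=1)).sum(axis=1), m - 1)
        y = x[idx] + rng.standard_normal(num_particles)
        R = (x - x[0]) * y[:, None] - 0.5 * (x**2 - x[0]**2)
        F = np.log(w) - np.log(w[:, :1])
        L = R + F

    # Step 3: average the entropy of r(y, L_{N,k}) over the particles, as in (143).
    grid = np.linspace(x.min() - 8.0, x.max() + 8.0, 2001)
    dy = grid[1] - grid[0]
    r = next_state_probs(L) @ gauss(grid[None, :] - x[:, None])
    return float(np.mean(-(r * np.log(r)).sum(axis=1) * dy))

# Hypothetical three-state Markov input.
x = [0.0, -2.0, 2.0]
T_markov = np.array([[0.80, 0.10, 0.10],
                     [0.30, 0.35, 0.35],
                     [0.30, 0.35, 0.35]])
h_markov = entropy_rate_mc(x, T_markov, num_particles=200, num_steps=20)
```

A useful sanity check is the i.i.d. case, where every row of `T` equals $P_X$: the particles then play no role and the estimate reduces to the differential entropy $h(Y)$ of the Gaussian mixture.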

B. Numerical Results 

1) Idealized duty cycle constraint $(q, 0)$: One implication of Theorem 1 is that directly computing the capacity-achieving input distribution requires solving an optimization problem with infinitely many variables, which is prohibitive. Assuming any upper bound on the number of probability mass points, however, a numerical optimization over the mutual information can yield a suboptimal input distribution and a lower bound on the channel capacity. As we increase the number of mass points, the lower bound can be further refined. We take this approach to numerically compute a good approximation of the channel capacity by optimizing over a sufficient number of probability mass points.

Given the duty cycle and power constraints, we first numerically optimize the mutual information over a 3-point input distribution (including a mass at 0), then increase the number of probability mass points by 2 at a time to improve the mutual information, until the improvement is less than $10^{-3}$.
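
The objective evaluated inside such an optimization is the mutual information of a finite discrete input over the AWGN channel. A minimal sketch of that evaluation is given below, using $I(X; Y) = h(Y) - \frac{1}{2}\log(2\pi e)$ with unit noise variance and a Riemann sum for $h(Y)$. The grid parameters, the function names, and the sample three-point input are illustrative assumptions; the optimizer itself is not shown.

```python
import numpy as np

def awgn_mutual_information(points, probs, half_width=10.0, num=4001):
    """I(X;Y) in nats for Y = X + N, N ~ N(0, 1), and X with finite support."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    y = np.linspace(points.min() - half_width, points.max() + half_width, num)
    dy = y[1] - y[0]
    dens = np.exp(-0.5 * (y[None, :] - points[:, None]) ** 2) / np.sqrt(2.0 * np.pi)
    p_y = probs @ dens                                  # output density on the grid
    h_y = -np.sum(p_y * np.log(p_y)) * dy               # differential entropy of Y
    return float(h_y - 0.5 * np.log(2.0 * np.pi * np.e))

def satisfies_constraints(points, probs, q, gamma):
    """Check P(X = 0) >= q and E{X^2} <= gamma."""
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    zero_mass = probs[points == 0].sum()
    return bool(zero_mass >= q and probs @ points**2 <= gamma)

# Hypothetical symmetric 3-point input with a mass at 0.
points = [-2.0, 0.0, 2.0]
probs = [0.35, 0.30, 0.35]
rate = awgn_mutual_information(points, probs)
feasible = satisfies_constraints(points, probs, q=0.3, gamma=3.0)
```

A numerical optimizer would adjust `points` and `probs` subject to `satisfies_constraints`, then add two mass points and repeat until the gain in `rate` falls below the stated threshold.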

First consider the case that the duty cycle is no greater than 70%, i.e., $P(X = 0) \ge q = 0.3$. For different SNRs, the mass points of the near-optimal input distribution with finite support, along with the corresponding probability masses, are shown in Fig. 1. Due to symmetry, only the positive half of the input distribution is plotted. We can see that as the SNR increases, more mass is put on higher-amplitude points, whereas the probability mass at zero eventually reaches its lower bound 0.3.

In Fig. 2 we compare the rate achieved by the near-optimal input distribution and the rate achieved by a conventional scheme using Gaussian signaling over a deterministic schedule, which is $(1 - q)$ times the Gaussian channel capacity without duty cycle constraint. The figure shows that there is substantial gain at both 0 dB and 10 dB SNR from using discrete input over Gaussian signaling with a deterministic schedule. For example, when the SNR is 10 dB and the duty cycle is no more than 50%, the discrete input distribution achieves a 50% higher rate. Hence departing from the usual paradigm of intermittent packet transmissions may yield significant gains.

We also plot in Fig. 2 the rate achievable by superposition coding, where the input distribution is a mixture of a Gaussian and a point mass at 0. We first decode the support of the input to find the positions of nonzero symbols, and then the Gaussian codeword conditioned on the support. The figure shows that the near-optimal discrete input achieves a higher rate than the mixture input.

2) Realistic duty cycle constraint $(q, c)$: In this subsection the numerical results for the lower bound on capacity and the suboptimal distribution are provided based on the results in Sections V and VI-A.

We first seek a discrete Markov chain with finite alphabet that maximizes the objective $L(\mu)$ defined in (11). Once the optimal Markov distribution $\mu^*$ is determined, we compute the achievable rate $I(\mu^*)$ according to (85).
Fig. 1. Suboptimal input distribution for $P(X = 0) \ge q = 0.3$.

TABLE I
$P_{X_2|X_1}$ AND $P_X$ FOR $q = 0.5$, $c = 1.0$, SNR = 8 dB.
[Transition matrix not recoverable; surviving entries: $-3.9281$, $0.0687$]
$P_X$: 0.7481, 0.0919, 0.0919, 0.0341, 0.0341



In this paper $\mu^* = (\mathcal{X}, \alpha, \beta, P_X)$ is used to approximate the optimum distribution $\mu_0$ by maximizing $L(\cdot)$. It is obvious that the optimized $\mu^*$ is symmetric about 0. Table I shows the transition probability matrix $P_{X_2|X_1}$ and stationary probability $P_X$ for $q = 0.5$, $c = 1.0$ and SNR = 8 dB. The symmetry of the transition probability matrix is evident because, conditioned on two consecutive symbols being nonzero, they are independent.

Fig. 3 shows the stationary (marginal) distribution of the suboptimal Markov input. In order to compensate for the transition cost, an additional fraction of zero symbols should be transmitted, so that $P_X(0) > q$. As the SNR increases, more and more weight is put on distant constellation points, whereas less and less weight is put on the zero letter.

In Fig. 4 the rates achieved by various optimized input distributions are plotted against the SNR. The rate achieved by the optimized Markov input is larger than that of the suboptimal i.i.d. input calculated by formula (93) with duty cycle constraint $(q, c)$. The lower bound $L(\mu)$ is quite tight and can be regarded as a good approximation of the mutual information of first-order Markov inputs.

Figs. 5 and 6 demonstrate the sensitivity of the achievable rates to the duty cycle parameter $q$ and the transition cost $c$, respectively. The performance of Markov inputs is superior to that of i.i.d. inputs as well as Gaussian signaling with a deterministic schedule. Fig. 5 shows that the performance of the i.i.d. input is similar to that of the deterministic schedule, which implies that, unlike the case of the idealized duty cycle constraint, i.i.d. input is not a good choice under the realistic duty cycle constraint.

Fig. 3. The marginal distribution of the stationary Markov input. Duty cycle $\le 0.5$, transition cost $c = 1.0$.

Fig. 4. The achievable rate vs. the SNR. Duty cycle $\le 0.5$, transition cost $c = 1.0$.



VII. Concluding Remarks 

In this paper we have studied the impact of duty cycle constraint on the capacity of AWGN channels. Under the idealized duty cycle constraint, the optimal distribution has an infinite number of probability mass points in a bounded interval. This allows efficient numerical optimization of the input distribution. Under the realistic duty cycle constraint, the capacity-achieving input is hard to compute. We develop techniques for computing a near-optimal input distribution. This input takes the form of a discrete first-order Markov process, which matches the "Markov" nature of the duty cycle constraint. The numerical results show that under the duty cycle constraint, departing from the usual paradigm of intermittent packet transmissions may yield substantial gain.

Fig. 5. The achievable rate vs. the duty cycle, SNR = 10 dB and transition cost $c = 1.0$.